.NSECP(Results)

.R1PAGE: PAGE;

This chapter opens  by summarizing what  AM "did". Section 1  gives a
fairly high-level description of the major paths which were explored,
the concepts discovered along  the way, the relationships which  were
noticed, and occasionally the ones which "should" have been but weren't.

.ONCE TURN ON "{}"

The next section 
({SECNUM}.2)
continues this  exposition by presenting the results
of experiments which were done with (and ⊗4on⊗*) AM.

.ONCE TURN ON "{}"

Chapter {[2] EVALU} will draw upon these results -- and others given
in the appendices -- to form conclusions about AM. Several meta-level
questions will be tackled there (e.g., "What are AM's limitations?").

.RESULT: SECNUM ;
.SSEC(What AM Did)

.QQ

Now we have seen that mathematical work is not simply mechanical, that it
could not be done by a machine, however perfect. It is not merely a question
of applying rules, of making the  most combinations possible according to certain
fixed laws. The combinations so obtained would be exceedingly numerous, useless,
and cumbersome. The true work of the inventor consists in choosing among these combinations
so as to eliminate the useless ones or rather to avoid the trouble of making them,
and the rules which must guide this choice are extremely fine and delicate. It is
almost impossible to state them precisely; they are felt rather than formulated.
Under these conditions, how imagine a sieve capable of applying them mechanically?

-- Poincaré

.ESS


AM is both a mathematician of sorts, and a big computer program.

.BEGIN TURN ON "{}"

By granting  AM more anthropomorphic  qualities than it  deserves, we
can  describe  its  progress  through  elementary  mathematics.    It
rediscovered many  well-known  concepts,  a  couple  interesting  but
not-generally-known ones,  and several  concepts which  were hitherto
unknown    and   should    have   stayed    that   way.       Section
{[2]OVERV}.{[2]AMAMSSEC}, on page {[2]AMAM}, recaps what AM did, much
as  a  historian might  critically  evaluate Euler's  work.   A  more
detailed prose description of everything AM did is found in  Appendix
{[2]TRACES}.{[2]PROS}, beginning on page {[3]PROSP}.

Instead of repeating  any of this descriptive prose here, Section
{SECNUM}.{SSECNUM}.1
will  provide a
very brief listing of what AM did in a single good run, task by task.
A much more detailed version of  this same list is found in  Appendix
{[2]TRACES}.{[2]TRATS}, beginning on page {[3]TRAT}. The task numbers
there  correspond to the numbering below$$  They do ⊗4not⊗* precisely
match the  task numbers  accompanying  the example  given in  Chapter
{[2]EXAM1}. $.
These task-by-task listings are not 
complete listings of every task
AM ever attempted in any of its many runs, 
but rather a trace of a single, better-than-average run of
the program.$$ In fact,  
it is perhaps the best overall run. It occurred in two stages (due to
space problems; unimportant). In this particular run, AM 
misses the few "very best"  discoveries it ever made,
since the runs they occurred in went in somewhat different directions. It
also omits some of the more boring tasks: see, e.g.,  the description
of task number {[3]OMITT}. $ The reader may wish to consult the brief
alphabetized glossary of concept names in the last chapter (page {[3]BRIEFC}),
or the more detailed appendix of concept descriptions (following page
{[3]ENDINDEXCP}).

Following this linear trace of AM's behavior is a more appropriate representation
of what it did: namely, a two-dimensional graph of that
same behavior as seen in "concept-space".
This forms Section
{SECNUM}.{SSECNUM}.2, and is found on page {[3]PBG}.

By under-estimating AM's sophistication, one can demand answers to
the typical questions asked about any computer program: how big is it,
how much cpu time does it use, what language is it coded in, etc.  These
are   
found in   Section
{SECNUM}.{SSECNUM}.3.

.END

. SSSEC(Linear Task-by-task Summary of a Good Run)

.SHORTASKP: PAGE;

.SHORTASK: SSECNUM;

.BN; INDENT 3,7,0;  ONCE PREFACE 2;

λλ Fill in examples of Compose. Failed, but suggested next task:

λλ Fill in examples of Set-union. Also failed, but suggested:

λλ Fill  in  examples of  Sets. Many  found  (e.g., by  instantiating
Set.Defn) and then more derived from those examples (e.g., by running
Union.Alg).

.SETTASK: BNN;

λλ Fill in specializations of Sets (because it was very easy  to find
examples  of  Sets). Creation  of  new  concepts. One,  INT-Sets,  is
related to  "Singletons".  Another, "BI-Sets", is all nests of braces
(no atomic elements).

λλ Fill in examples of INT-Sets. This indirectly led to a rise in the
worth of Equal.

λλ Check all examples of INT-Sets. All were confirmed.
AM defines the set of Nonempty INT-Sets; this is renamed "Singletons" by the user.

λλ Check  all examples of  Sets.  To  check a couple  conjectures, AM
will soon look for Bags and Osets.

λλ Fill in examples of Bags.

λλ Fill in specializations of Bags. Created INT-Bags (contain just one kind of
element), and BI-Bags (nests of parentheses).

λλ Fill in examples of Osets.

λλ Check examples of Osets.

λλ Fill in examples of Lists.

λλ Check examples of Lists.

λλ Fill in examples of All-but-first.

λλ Fill in examples of All-but-last.

λλ Fill in specializations of All-but-last. Failed.

λλ Fill in examples of List-union.

λλ Fill in examples of Proj1.

λλ Check examples of All-but-first.

λλ Check examples of All-but-last.

λλ Fill in examples of Proj2.

λλ Fill in examples of Empty-structures. 4 found.

λλ Fill in generalizations of Empty-structures. Failed.

λλ Check examples of List-union.

λλ Check examples of Bags. Defined Singleton-bags.

λλ Fill in examples of Bag-union.

λλ Check examples of Proj2.

λλ Fill in examples of Set-union.

λλ Check examples of Set-union. Define λ (x,y) x∪y=x, later called Superset.

.SUBBAG: BNN;

λλ Fill in examples of Bag-insert.

λλ Check examples of Bag-insert. Range is really Nonempty bags. Isolate the results
of insertion restricted to Singletons: call them Doubleton-bags.

λλ Fill in examples of Bag-intersect.

λλ Fill in examples of Set-insert.

λλ Check examples of Set-insert. Range is always Nonempty sets.
Define λ (x,S) Set-insert(x,S)=S; i.e., set membership.
Define Doubleton sets.

λλ Fill in examples of Bag-delete.

λλ Fill in examples of Bag-difference.

λλ Check examples of Bag-intersect. 
Define λ (x,y) x∩y=(); i.e., disjoint bags.

λλ Fill in examples of Set-intersect. 

λλ Check examples of Set-intersect. 
Define λ (x,y) x∩y=x; i.e., subset.
Also define disjoint sets: λ (x,y) x∩y=α{α}.

.SUPERBAG: BNN;

λλ Fill in examples of List-intersect.


λλ Fill in examples  of Equal. Very difficult to  find examples; this
led to:

λλ Fill  in generalizations of Equal.  Define
"Same-size", "Equal-CARs", and some losers.

λλ Fill in examples of Same-size.

λλ Apply an Algorithm for  Canonize to the args Same-size and  Equal.
AM eventually synthesizes the canonizing function "Size".  AM defines
the set of canonical structures: bags of T's; this later gets renamed
as "Numbers".

.NUMTASK: BNN;

λλ Restrict  the  domain/range  of Bag-union.    A new  operation  is
defined,  Number-union,  with domain/range  entry  <Number Number  α→
Bag>.

λλ Fill in examples of Number-union.  Many found. 

λλ Check  the domain/range of  Number-union.   Range is `Number'.
This operation is renamed "Add2".

λλ Restrict the domain/range of  Bag-intersect to Numbers. Renamed
"Minimum".

λλ Restrict  the domain/range  of Bag-delete  to Numbers.   Renamed
"SUB1".

λλ  Restrict  the  domain/range  of  Bag-insert  to  Numbers.     
AM calls the new operation "Number-insert". Its
domain/range entry is <Anything Number α→ Bag>.

λλ  Check  the  domain/range  of  Number-insert.  This  doesn't  lead
anywhere.

λλ  Restrict  the  domain/range of  Bag-difference  to  Numbers. This
becomes "Subtract".

λλ Fill  in examples  of Subtract.  This leads to defining the relation LEQ 
(⊗6≤⊗*).$$
If  a
larger number is "subtracted" from a smaller, the result is zero.  AM
explicitly  defines the set  of ordered pairs of  numbers having zero
"difference". <x,y> is in that set iff x is less than or equal to y. $

λλ Fill in examples of LEQ. Many found.

λλ Check examples of LEQ. 

λλ Apply algorithm of Coalesce to LEQ.  LEQ(x,x) is Constant-True.

λλ Fill in examples of Parallel-join2. Included are
Parallel-join2(Bags,Bags,Proj2),  which is renamed "TIMES",
and
Parallel-join2(Structures,Structures,Proj1), a generalized
Union operation renamed "G-Union", and a bunch of losers.

.TIMTASK: BNN;

.Z7←BNN+12;

.OMITT: Z7;

λλ-- ⊗6{Z7}.⊗*  Fill  in  and check  examples  of  the  operations just
created.

.BNN←Z7;

λλ Fill in examples of Coalesce.  Created: Self-Compose, Self-Insert,
Self-Delete, Self-Add,  Self-Times, Self-Union, etc.   Also:
Coa-repeat2, Coa-join2, etc.

λλ Fill in examples of Self-Delete. Many found.

λλ  Check examples  of  Self-Delete.  Self-Delete is just Identity-op.

λλ Fill in examples of Self-Member. No positive examples found.

λλ  Check examples  of  Self-Member.  Self-member is just Constant-False.

λλ Fill in examples of Self-Add. Many found. User renames this "Doubling".

λλ Check examples of Coalesce. Confirmed.

λλ Check examples of Add2. Confirmed.

λλ Fill in examples of Self-Times. 
Many found. 
Renamed "Squaring" by the user.

λλ  Fill  in  examples   of  Self-Compose. 
Defined Squaring-o-Squaring.
Created Add-o-Add (two versions: Add21 which is
λ  (x,y,z)  (x+y)+z,  and Add22  which  is  x+(y+z)).  Similarly, two
versions of Times-o-Times and of
Compose-o-Compose. 

λλ Fill in examples of Add21. (x+y)+z. Many are found.

λλ Fill in examples of Add22. x+(y+z). Again many are found.

λλ Check examples of Squaring. Confirmed. 

λλ Check examples of Add22.   Add21 and Add22 appear  equivalent. But
first:

λλ Check examples of Add21.  Add21 and Add22 still appear equivalent.
Merge them.  So the proper argument for a generalized "Add" operation
is a Bag.

λλ Apply algorithm for Invert to argument `Add'. Define Inv-add(x) as
the set of all bags of numbers (>0) whose sum is x. Also denoted Add-1-(x).

λλ Fill in examples of TIMES21. (xy)z. Many are found.

λλ Fill in examples of TIMES22. x(yz). Again many are found.

λλ Check examples of TIMES22.  TIMES21 and TIMES22 may be equivalent.

λλ Check  examples of  TIMES21.   TIMES21  and TIMES22  still  appear
equivalent. Merge  them.   So the proper  argument for  a generalized
"TIMES" operation  is a Bag. Set up an analogy between TIMES and ADD,
because  of  this  fact.   

λλ   Apply  algorithm   for  Invert   to  argument   `TIMES'.  Define
Inv-TIMES(x) as the set of all bags  of numbers (>1) whose product is x.  
Analogic to Inv-Add.

λλ   Fill    in   examples   of    Parallel-replace2.       Included   are
Parallel-replace2(Bags,Bags,Proj2) (called MR2-BBP2), and many losers.

.Z7←BNN+16;

λλ-- ⊗6{Z7}.⊗*  Fill  in  and check  examples  of  the operations  just
created.

.BNN←Z7;

λλ Fill  in examples  of Compose.   So easy that AM creates Int-Compose.

λλ Fill in examples  of Int-Compose.  The  two chosen operations  G,H
must be  such that ran(H)⊗6ε⊗*dom(G),  and ran(G)⊗6ε⊗*dom(H);  both G
and  H must  be interesting.
Create G-Union-o-MR2-BBP2,$$ an alternate
derivation of  the operation of  multiplication. $
Insert-o-Delete, Times-o-Squaring, etc.

.Z7←BNN+18;

λλ-- ⊗6{Z7}.⊗* Fill  in and  check examples  of  the compositions  just
created.   Notice that  G-Union-o-MR2-BBP2 is just
TIMES.

.BNN←Z7;

λλ    Fill    in     examples    of    Coa-repeat2.    Among    them:
Coa-repeat2(Bags-of-Numbers, Add2)     [multiplication again!],
Coa-repeat2(Bags-of-Numbers, Times) [exponentiation],  
Coa-repeat2(Structures, Proj1) [CAR],
Coa-repeat2(Structures, Proj2) [Last-element-of], etc.

λλ Check the examples of Coa-repeat2. All confirmed.

λλ Apply algorithms for Invert to `Doubling'.
The result is called "Halving" by the user.
AM then defines "Evens".

λλ Fill in examples of Self-Insert.

λλ  Check examples of  Self-Insert. Nothing special  found. 

λλ Fill in examples of  Coa-repeat2-Add2. 

λλ Check examples of  Coa-repeat2-Add2. It's the same as TIMES.

λλ   Apply  algorithm   for  Invert   to  argument   `Squaring'.  Define
"Square-root".

λλ Fill in examples of Square-root. Some found, but very inefficiently.

λλ Fill in new algorithms for Square-root. Had to ask user for a good one.

λλ Check examples of Square-root. Define the set of numbers
"Perfect-squares".

λλ Fill in examples of Coa-repeat2-Times. This is exponentiation.

λλ Check  examples  of Coa-repeat2-Times.  Nothing  special  noticed,
unfortunately.

λλ Fill in examples of Inv-TIMES.
Many found, but inefficiently.

λλ Fill in new algorithms for Inv-TIMES. Obtained opaquely from the user.

λλ Check examples of Inv-TIMES. This task suggests the next one:

λλ Compose G-Union with  Inv-TIMES. Good domain/range. Renamed
"Divisors".

λλ Fill in examples of Divisors. Many found, but not very efficiently.

λλ Fill in new algorithms for Divisors. Obtained from the user.

λλ Fill in examples of Perfect-squares. Many found.

λλ Fill in specializations of TIMES. 
Times1(x)⊗6≡⊗*1⊗8#⊗*x, Times2(x)⊗6≡⊗*2x,
Times-sq is TIMES with its domain restricted to bags of perfect squares, Times-ev
takes only even arguments,
Times-to-evens requires that the result be even, Times-to-sq, ...

λλ Check examples of Divisors.
Define 0-Div, 1-Div, 2-Div, and 3-Div, the sets of numbers
whose Divisors value is the empty set, a singleton, a doubleton, and a tripleton,
respectively. 

λλ Fill in examples of 1-Div. 
Only one example found: "1". 
Lower 1-Div.Worth.

λλ Fill in examples of 0-Div.
None found. Lower the worth of this concept.

λλ Fill in examples of 2-Div.
A nice number of examples are found. Raise 2-Div.Worth.

λλ Check examples of 2-Div. All confirmed,
but no pattern noticed.

λλ Fill in examples of 3-Div. A nice number of examples found.

λλ Check examples of 3-Div. All confirmed. All are perfect squares.

λλ Restrict Square-root to numbers which are in 3-Div. Call this Root3.

λλ Fill in examples of Root3. Many found.

λλ Check examples of Root3. All confirmed. All are in 2-Div. 
Raise their worths.

λλ Restrict Squaring to 2-divs. Call the result Square2.

λλ Fill in examples of Square2. Many found.

λλ Check the range of Square2. Always 3-Divs.
Conjecture: x has 2 divisors iff x↑2 has 3 divisors.

λλ Restrict Squaring to 3-Divs. Call the result Square3.

λλ Restrict Square-rooting to 2-Divs. Call the result Root2.

λλ Fill in examples of Square3. Many found.

λλ Compose Divisors-of and Square3. Call the result Div-Sq3.

λλ Fill in examples of Div-Sq3. Many found.

λλ Check examples of Div-Sq3.
All such examples are Same-size.

.Z7←BNN+8;

λλ-- ⊗6{Z7}.⊗* More confirmations and explorations of the above conjecture.
Gradually, all its ramifications lead to dead-ends (as far as AM is concerned).

.BNN←Z7;

λλ Fill in examples of Root2. None found.
Conjecture that there are none.

λλ Check examples of Inv-TIMES. Inv-TIMES always contains a singleton bag, and
always contains a bag of primes. 

λλ Restrict the range of Inv-TIMES to bags of primes. Call this
Prime-Times.

λλ Restrict the range of Inv-TIMES to singletons. Called
Single-Times.

λλ Fill in examples of Prime-times. Many found.

λλ Check examples of Prime-times. Always
a singleton set. 
User renames this conjecture "The unique factorization theorem".

λλ Fill in examples of Single-TIMES. Many found.

λλ Check examples of Single-TIMES. Always
a singleton set.
Single-TIMES is actually the same as Bag-insert!

λλ Fill in examples of Self-set-union. Many found.

λλ Check examples of Self-set-union. This operation is same as 
Identity.

λλ Fill in examples of Self-bag-union. Many found.

λλ Check examples of Self-bag-union. Confirmed. Nothing interesting noticed.

λλ Fill in examples of Inv-ADD.

λλ Check examples of Inv-ADD. Hordes of boring conjectures, so:

λλ Restrict the domain of Inv-ADD to primes (Inv-Add-primes), to evens
(Inv-Add-evens), to squares, etc.

λλ Fill in examples of Inv-add-primes. Many found.

λλ Check examples of Inv-add-primes. Confirmed, but nothing special noticed.

λλ Fill in examples of Inv-add-evens. Many found.

λλ Check examples of Inv-add-evens. Always contains
a bag of primes.

λλ Restrict the range of Inv-Add-evens to bags of primes. Called
Prime-ADD.

λλ Restrict the range of Inv-ADD to singletons. Call that new operation
Single-ADD.

λλ Fill in examples of Prime-ADD. Many found.

λλ Check examples of Prime-ADD. Always
a nonempty set (of bags of primes). 
User renames this conjecture "Goldbach's conjecture".

λλ Fill in examples of Single-ADD. Many found.

λλ Check examples of Single-ADD. Always
a singleton set. This operation is the same as
Bag-insert and Single-TIMES.

λλ Restrict the range of Prime-ADD to singletons, by analogy to
Prime-TIMES.$$ In this case, AM is asking which numbers are uniquely representable
as the sum of two primes.$ Call the new operation Prime-ADD-SING.

λλ Fill in examples of Prime-ADD-SING. Many found.


λλ Check examples of  Prime-ADD-SING. Nothing special noticed.

λλ Fill in examples of Times-sq.$$ Recall that this is just TIMES restricted
to operate on perfect squares. $ Many examples found.

λλ Check domain/range of Times-sq. Is the range actually Perfect-squares?
Yes!

λλ Fill in examples of Times1. Recall that Times1(x)⊗6≡⊗*TIMES(1,x).

λλ Check examples of Times1. Apparently just a restriction of Identity.

λλ Check examples of Times-sq. Confirmed.

λλ Fill in examples of Times0. 

λλ Fill in examples of Times2.

λλ Check examples of Times2. Apparently the same as Doubling.
That is, x+x=2⊗8#⊗*x. Very important.
By analogy, define Ad2(x) as x+2.

λλ Fill in examples of Ad2.

λλ Check examples of Ad2. Nothing interesting noticed.

λλ Fill in specializations of Add. Among those created are:
Add0 (x+0), Add1, Add3, ADD-sq (addition restricted to perfect squares),
Add-ev (sum of even numbers), Add-pr (sum of primes), etc.

λλ Check examples of Times0. The value always seems to be 0.

λλ Fill in examples of Times-ev.$$ Recall that Times-ev is just like
TIMES restricted to operating on even numbers. $ Many examples found.

λλ Check examples of Times-ev. Apparently all the results are Evens.

λλ Fill in examples of Times-to-ev.$$ That is, consider bags of numbers
which multiply to give an even number. $ Many found.

λλ Fill in examples of Times-to-sq. Only a few found.

λλ Check examples of Times-to-sq. All arguments always seem to be squares.
Conjec: Times-to-sq is really the same as Times-sq. Merge the two.
This is a false conjecture, but it did AM no harm.

λλ Check examples of Times-to-ev. The domain always contains an even number.

λλ Fill in examples of Self-Union.

λλ  Check examples of  Self-Union. 

λλ Fill in examples of SubSet.

λλ Check examples of SubSet.

λλ Fill in examples of SuperSet.

.ONCE TURN ON "{}";

λλ Check examples of SuperSet. 
Conjec: Subset(x,y) iff Superset(y,x). Important.

.SUBTIE:  BNN;  

λλ Fill in examples of Compose-o-Compose-1. AM creates
some explosive combinations (e.g.,
(Compose-o-Compose)-o-(Compose-o-Compose)-o-(Compose-o-Compose)),
some poor ones (e.g., Square-o-Count-o-ADD-1-),
and even a few -- very few -- winners (e.g., SUB1-o-Count-o-Self-Insert).

λλ Check examples of Compose-o-Compose-1.

λλ Fill in examples of Compose-o-Compose-2.$$ Recall that the difference
between this operation and the last one is merely in the order of the composing:
F-o-(G-o-H) versus (F-o-G)-o-H. $ 
AM recreates many of the previous tasks' operations.

λλ Check examples of Compose-o-Compose-2. Nothing noticed yet$$ Later on,
AM will use these new operations to discover the associativity of Compose. $.

.Z7←BNN+21;

λλ-- ⊗6{Z7}.⊗*  Fill  in  and check  examples  of  the  losing compositions just
created.

.BNN←Z7;

λλ Fill in examples of Add-sq (i.e., sum of squares).

λλ Check domain/range entries of Add-sq. The range is not always perfect squares.
Define Add-sq-sq(x,y),
which is True iff x and y are perfect squares and their sum is a perfect
square as well.

λλ Fill in examples of Add-pr; i.e., addition of primes.

λλ Check Domain/range entries of Add-pr. AM defines the set of pairs of primes
whose sum is also a prime. This is a bizarre derivation of prime pairs.

.E
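
To make the unary arithmetic in the preceding trace concrete, the following
is a minimal sketch, written in modern Python rather than AM's Interlisp
(the function names and the modeling of bags as lists are mine): a Number is
a canonical bag of T's, "Add2" is Bag-union restricted to such bags, and
TIMES falls out as the composition G-Union-o-MR2-BBP2, just as the trace
notes.

.B0

# A bag is modeled as a Python list whose order we agree to ignore;
# a Number is a canonical bag containing only T's.
T = 'T'

def number(n):                    # build the canonical bag for n
    return [T] * n

def size(bag):                    # the canonizing function "Size"
    return len(bag)

def add2(m, n):                   # "Add2": Bag-union restricted to Numbers
    return m + n

def mr2_bbp2(m, n):               # Parallel-replace2(Bags,Bags,Proj2):
    return [list(n) for _ in m]   # replace each element of m by a copy of n

def g_union(bag_of_bags):         # "G-Union": generalized union
    return [x for b in bag_of_bags for x in b]

def times(m, n):                  # "G-Union-o-MR2-BBP2 is just TIMES"
    return g_union(mr2_bbp2(m, n))

assert size(add2(number(2), number(3))) == 5
assert size(times(number(2), number(3))) == 6

.E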

.  SKIP TO COLUMN 1;  SSSEC(Two-Dimensional Behavior Graph)


On the next two pages is a graph of the same "best run" which AM executed.
The nodes are concepts, and the links are actions which AM performed.
Labels on the links indicate when each action was taken, so the reader
may observe how AM jumped around. It should also be easy to perceive from the graph
which paths of development were abandoned, which concepts ignored, and which ones
concentrated upon. These are precisely the features of AM's behavior which are
awkward to infer from a simple linear trace (as in the previous section).

In more detail, here is how to read the graph:
Each node is a concept. 
To save space, these names are often highly abbreviated. For example,
"x0" is used in place of "TIMES-0". 

Each concept name is surrounded by from zero to four numbers:

.BEGIN NOFILL; PREFACE 0; INDENT 20; SELECT 6;
318      288
FROBNATION
310      291
.E
The upper right number indicates the task number (see last section) during which
examples of this concept were filled in. The lower right number tells when they
were checked. 
The upper left number indicates when the Domain/range facet of that
concept was modified. Finally, the lower left number is the task number during
which some new Algorithms for that concept were obtained.
A number in parentheses  indicates that the task with that number was a total
failure.

Because of the limited space, it was decided that if a concept were ever
renamed by the user, then only that newer, mnemonic name would be given in
the diagram. Thus there is an arrow from "Coalesce" to "⊗4Square⊗*", an operation
originally called "Self-Times" by AM.

Sometimes, a concept will have under it a note of the form ⊗6≡GROK⊗1.
This simply means that AM eventually discovered that the concept was
equivalent to the already-known concept "Grok", and probably forgot about
this one (merged it into the one it already knew about).
The "trail" of discovery may pick up again at that pre-existing concept.
A node written as ⊗6=GROK⊗* means that the concept was really the same as
"Grok", but AM never investigated it enough to notice this.

Each node may have an arrow leading into it, and any number of arrows emanating
from it. The arrows indicate the creation of new concepts. Thus an arrow leading
to concept "Frobnate" indicates how that concept was created. An arrow directed
away from Frobnate points to a concept created as, e.g., a specialization or an
example of Frobnate. No arrowheads are 
in practice necessary:
all arrows are directed ⊗4downwards⊗*.

The arrows may be labelled, indicating precisely what they represent (e.g.,
composition, restriction) and what the task number was when they occurred.
For space reasons, the following convention has proven necessary: 
if an arrow emanating from C is un-numbered, 
it is assumed to have occurred at the same time
as the arrow to its immediate left which also points from C;
if all the arrows
emanating from C have no number, then all their times of occurrence are
assumed to be the ⊗4lower right⊗*$$ 
This is often true because many concepts are created
while checking examples of some known concept.
$ number of C.  
Finally, if C has no lower right number, the arrow is assumed to have the
value of the upper right number of C.

An unlabelled arrow is assumed to be an act of
Specialization or the creation of
an Example.$$ 
It should be clear in each context which is happening. If not, refer to the
short trace in the preceding section, and    look up the appropriate task number.
$ Labels, when  they do occur, are given in capitals and small letters;
concept names (nodes) are by contrast in all capitals.

.ONCE TURN ON "{}";

All the numbers correspond to those given to the tasks
in the task-by-task traces presented
in the last section (p. {[3]SHORTASKP}) and in Appendix {[2]TRACES} 
(p. {[3]TRAT}).

The first part of this graph (presented below) contains static  structural
(and ultimately numerical) concepts which were studied by AM:

.SKIP 19;


The rest of the graph (presented on the next page) deals with activities which were
investigated:

.SKIP TO COLUMN 1; 

.PBG: PAGE;

( Paste Concept Development Behavior Graph here. )


.  SKIP TO COLUMN 1; SSSEC(AM as a Computer Program)

When viewed as a large LISP program, AM holds very little of interest.
There is the usual battery of customized functions (e.g., a conditional
PRINT function), the storage hacks (special emergency garbage-collection
routines, which know which facets are expendable), the time hacks
(omnisciently arrange clauses in a conjunction so that the one most likely
to fail will come first), and the bugs (if the user renames a concept
while it's the current one being worked on, 
there is a 5% chance of AM entering an infinite loop).

Below are listed a few parameters of the system, although I doubt that
they hold any theoretical significance. The reader may be curious about how
big AM, how long it takes to execute, etc.



Machine: SUMEX, PDP-10, KI-10 uniprocessor, 256k core memory.

Language: Interlisp, January '75 release, 
which occupies 140k of the total 256k, 
but which provides a surplus  "shadow space"
of 256k additional words available for holding compiled code.

AM support code: 
200 compiled (not block-compiled) utility routines, control routines, etc.
They occupy roughly 100k, but all are pushed into the shadow space.

AM itself: 115 concepts, each occupying about .7k
(about two typed pages, when Pretty-printed with indentation).
Facets and their entries are stored as
property/value pairs on the property lists of atoms whose names are the concepts' names.$$
Snazzy feature:
Executable entries on facets  (e.g., an entry on Union.Alg) are stored uncompiled
until the first time they are actually called on, at which time they are compiled
and then executed. $
Each concept has about 8 facets filled in.
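
The storage scheme just described amounts to a property list keyed by facet
names, with executable entries compiled lazily. A sketch follows, in modern
Python rather than the actual Interlisp (the names "putb" and "LazyAlg" are
mine, and eval stands in for Interlisp's compiler):

.B0

concepts = {}                              # concept name -> property list

def putb(concept, facet, entry):           # store an entry under a facet
    concepts.setdefault(concept, {})[facet] = entry

class LazyAlg:
    """An executable entry kept as source until its first invocation."""
    def __init__(self, source):
        self.source, self.compiled = source, None
    def __call__(self, *args):
        if self.compiled is None:          # "compile" on first use only
            self.compiled = eval(self.source)
        return self.compiled(*args)

putb('UNION', 'ALG', LazyAlg('lambda s1, s2: set(s1) | set(s2)'))
print(concepts['UNION']['ALG']({1, 2}, {2, 3}))    # compiled, then run

.E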

Heuristics are tacked onto the facets of the concepts. The more general
the concept, the more heuristic rules it has attached to it.$$
This was not done consciously,
and may or may not hold some theoretical significance. $
"Any-concept" has 121 rules; "Active concept" has 24; "Coalesce" has 7;
"Set-Insertion" has none. There are 250 heuristic rules in all, divided into
4 flavors (Fillin, Check, Suggest, Interestingness).
Although the mean number of rules is therefore only about 2.2 (i.e., 
less than 1 of each flavor) per
concept,
the standard deviation of this is a whopping 127.4. The average number of heuristics
(of a given flavor)
encountered rippling upward from a randomly-chosen concept C (along the
network of generalization links) is about 35, even though  the mean path length
is only about 4.$$ If the heuristics were homogeneously distributed among the
concepts, the number of heuristics (of a given type) along a typical path 
of length 4 would only be about 2, not 35.  If all the heuristics were tacked
onto Anything and Any-concept, the number encountered in any path would be
75, not 35. $ 
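
The "rippling" just mentioned can be pinned down with a short sketch (again
modern Python; the fragment of the generalization network and the rule names
are invented for illustration): to gather the heuristics relevant to a
concept C, collect the rules attached to C and to every concept reachable
upward from C along generalization links.

.B0

generalizations = {                        # hypothetical fragment
    'SET-INSERT':  ['OPERATION'],
    'OPERATION':   ['ACTIVE'],
    'ACTIVE':      ['ANY-CONCEPT'],
    'ANY-CONCEPT': [],
}
heuristics = {'SET-INSERT': [], 'OPERATION': ['h1', 'h2'],
              'ACTIVE': ['h3'], 'ANY-CONCEPT': ['h4', 'h5']}

def relevant_heuristics(c, seen=None):
    seen = set() if seen is None else seen
    if c in seen:                          # concepts may share ancestors
        return []
    seen.add(c)
    rules = list(heuristics.get(c, []))
    for g in generalizations.get(c, []):   # ripple up each link
        rules.extend(relevant_heuristics(g, seen))
    return rules

print(relevant_heuristics('SET-INSERT'))   # h1 h2 h3 h4 h5

.E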

The total number of jobs executed in a typical run (from scratch) is about 200.
The run ends because of space problems, but AM's performance begins to degrade
near the end anyway.

"Final" state of AM: 300 concepts, each occupying about 1k. Many are swapped out
onto disk. 
Number of winning concepts discovered: 25 (estimated).
Number of acceptable concepts defined: 100 (est.).$$ For a list of most of the
`winners' and `acceptables', see the final section in Appendix {[2]ALLCON},
page {[3]NEWCLIST}. $
Number of losing concepts unfortunately worked on: 60 (est.).
The original 115 concepts have grown to an average size of 2k.
Each concept has about 11 facets filled in.

About 30 seconds of cpu time were allocated to each task, on the average,
but the task typically used only about 18 seconds before quitting.
Total CPU time for a run is about 1 hour. Total cpu time consumed by this
research project was about 500 cpu hours.

Real time: about 1 minute per task, 2 hours per run. 
The idea for AM was formulated in the Fall of 1974, and AM was coded in the
summer of 1975.
Total time consumed
by this project to date has been about 2500 man-hours: 700 for planning,
500 for coding, 600 for modifying and debugging and experimenting,
and 700 for writing this thesis.

.SKIP 1

.SSEC(Experiments with AM)

.EXPT: SECNUM;

.EXPTSSEC: SSECNUM;

.EXPTPAGE: PAGE;

Now we've  described the activities  AM carried  out during its  best
run.   AM was working by itself, and each  time executed the top task
on the  agenda.   It  received no  help  from the  user, and  all its
concepts' Intuitions facets had been removed.

One valuable aspect of AM is that it is amenable to many kinds of
interesting experiments.  Although AM is too ⊗4ad hoc⊗* for numerical
results to have much significance, the qualitative results perhaps do
have  some   valid  things  to  say  about   research  in  elementary
mathematics, about  automating  research,  and  at  least  about  the
efficacy of various parts of AM's design.

This section will explain  what it means to perform  an experiment on
AM,  what kinds  of experiments  are imaginable,  which of  those are
feasible, and  finally will  describe the many experiments which  were
performed on AM.

By modifying AM in various ways, its behavior can be altered, and the
⊗4quality⊗*  of  its  behavior  will change  as  well.  As  a drastic
example, one experiment involved forcing AM to select the next task to work on
⊗4randomly⊗* from the agenda, not the top task each time. Needless to say,
the performance was very different from usual.

By careful planning, each experiment can tell us something new  about
AM: how  valuable a  certain piece  of it  is, how  robust a  certain
scheme  really is, etc.   The results of these  experiments would then
have  something  to   contribute  to  a   discussion  of  the   "real
intelligence"  of  AM  (e.g.,  what features  were  superfluous),  and
contribute to  the design of the "next" AM-like system.  Generalizing
from those  results,  one might  suggest  some hypotheses  about  the
larger task of automated math research.

Let's cover the different  ⊗4kinds⊗* of experiments one could perform
on AM:

(i) Remove individual  concept modules,  and/or individual  heuristic
rules. Then  examine how  AM's performance  is degraded.   AM  should
operate even  if most of its heuristic rules  and most of its concept
modules were excised.
If the remaining  fragment of AM is too small, however,
it may  not be able to find anything interesting to do. In fact, this
situation was actually encountered experimentally, when the first few
partially complete concepts were inserted. If only a little bit of AM
is removed, the remainder  will in fact  keep operating without  this
"uninteresting collapse".   The converse situation should  also hold:
although  still functional  with any  concept module  unplugged, AM's
performance ⊗4should⊗*  be  noticeably degraded.  That is,  while  not
indispensable, each concept should nontrivially help the others.  The
same  holds for each individual heuristic  rule.  
When a piece of AM is removed,
which concepts does
AM then "miss" discovering?  Is the  removed concept/heuristic  later
discovered  anyway  by those  which  are left  in  AM?   This  should
indicate the importance  of each kind  of concept and  rule which  AM
starts with.

(ii) Vary  the relative  weights given  to features  by the  criteria
which  judge aesthetics, interestingness,  worth, utility,  etc.  See
how important each factor is in directing AM along successful routes.
In other  words, vary the  little numbers  in the formulae  (both the
global priority-assigning formula and  the local  
reason-rating
ones inside  heuristic rules).   One
important result will be  some idea of the robustness  or "toughness"
of the numeric weighting  factors. If the system easily collapses, it
was too finely tuned to begin with.

(iii) Add several new concept modules (including new heuristics
relevant to them) and
see if AM can  work in some unanticipated field  of mathematics (like
graph theory or calculus or plane geometry).  Do earlier achievements
-- concepts and conjectures AM synthesized already  -- have  any
impact in  the new domain?  Are some specialized heuristics  from the
first  domain totally wrong  here? Do all the  old general heuristics
still  hold  here?  Are  they  sufficient,  or   are  some  "general"
heuristics  needed here which  weren't needed  before? Does  AM "slow
down" as more and more concepts get introduced?

(iv) Try to have AM 
develop nonmathematical theories  (like elementary physics, or program
verification).  This  might  require  limiting  AM's  freedom to
"ignore  a  given  body  of  data  and  move  on  to  something  more
interesting". The exploration of  very non-formalizable fields (e.g.,
politics) might require much more than a small augmentation of AM's
base of concepts. For some such domains, the "Intuitions" scheme, which had to
be abandoned for math, might prove valid and valuable.

(v) Add several  new concepts dealing with  proof, and of course  add
all the associated heuristic rules. Such rules would advise AM on the
fine points of  using various techniques  of proof/disproof: when  to
use them, what to try next based on why the last attempt failed, etc.
See if the ⊗4kinds⊗* of discoveries AM makes are increased.

Just
prior to the writing of this document,
several experiments (of  types i, ii, and
iii above$$  experiments  of type  (iv)  weren't tried and are left as
"open problems", as invitations for future research efforts.
Experiment (v)  will probably be carried out this year (1976). $) were
set up and performed on  AM. We're now ready to examine each  of them
in detail.  The following points are covered for each experiment:

.BN

λλ How was it thought of? 

λλ What will be gained by  it?  What would be the implications of the
various possible outcomes?

λλ How was the experiment set up? What preparations/modifications had
to be made? How much time (man-hours) did it take?

λλ What happened?  How did AM's  behavior change? Was  this expected?
Analysis.

λλ  What was learned from  this experiment? Can  we conclude anything
which suggests new  experiments (e.g.,  use a better  machine, a  new
domain) or which bears on a  more general problem that AM faced (e.g.,
a new way to teach math, a new idea about doing math research)?

.E

. SSSEC(Must the Worth numbers be finely tuned?)

.EX3PAGE: PAGE;

Each of the 115 initial concepts
was supplied (by the author) with a number between 0
and 1000, stored as its Worth facet, which is supposed to represent
the overall  value of  the concept.  "Compose" has  a higher  initial
Worth than "Structure-delete", which is  higher than "Equality"$$ As AM
progresses, it notices something interesting about Equality every now
and then, and pushes its  Worth value upwards.   $.

Frequently, the priority of a task involving C depends on the overall
Worth of C. 
How  sensitive is
AM's  behavior to  the initial  settings of  the Worth  facets?   How
finely tuned must these initial Worth values be?

This experiment was thought of because of the `brittleness' of many other
AI systems, the amount of fine tuning needed to elicit coherent behavior.
For example, see the discussion of limitations of PUP6, in [Lenat 75b].
The author believed that AM was very resilient in this regard, and that
a demonstration of that fact would increase credibility in the power of the ideas
which AM embodies.


To test this, a simple experiment was performed. Just before starting
AM, the mean value of all concepts' Worth values was computed. It turned out to be
roughly 200. Then  each concept had its Worth reset to  the value 200.$$
The initial spread of values was from 100 to 600. $
This
was done "by hand", by  the author, in a  matter of seconds.  AM  was
then started and run as if  there were nothing amiss, and its behavior
was watched carefully.

What happened?  By and large, the same major discoveries were made --
and missed -- as usual, in  the same order as usual.  But  whereas AM
proceeded fairly smoothly before, with little superfluous activity, it
now wandered quite blindly  for long periods  of time, especially  at
the  very beginning.  Once  AM "hooked  into"  a line  of  productive
development,  it  followed  it  just  as  always, with  no  noticeable
additional wanderings.  As one of  these lines  of developments  died
out, AM would wander around again, until the next one was begun.

It took roughly three times as long for each major discovery to occur
as  normal.  This "delay"  got shorter  and  shorter as  AM developed
further.   In  each  case,  the  tasks preceding  the  discovery  and
following  it were pretty  much the  same as  normal; only  the tasks
"between" two periods of development were different -- and much  more
numerous.  The  precise  numbers  involved  would  probably  be  more
misleading than  helpful, so they will not  be given$$ Any reader who
wishes to  perform  this experiment  can  simply say  [MAPC  CONCEPTS
'(LAMBDA  (c) (SETB  c WORTH  200] to  Interlisp, just  before typing
(START) to begin AM. $.

The  reader may be interested to learn  that the Worth values of many
of the  concepts -- and  most of the  new concepts  -- ended up  very
close  to the same  values that  they achieved  in the  original run.
Overrated concepts were  investigated and  proved boring;  underrated
concepts had  to wait longer  for their  chances, but then quickly  proved
interesting and had their Worth facets boosted.

The conclusion I  draw from this change in behavior is that the Worth
facets are useful for making blind decisions -- where AM  must choose
based  only on  the overall  worths of  the various  concepts in  its
repertoire.  Whenever  a specific  reason  existed, it  was  far more
influential than  the "erroneous"  Worth values.   The close,  blind,
random decisions occur  between long bursts of specific-reason-driven
periods of creative work.$$ Incidentally, GPS behaved just this same way.
See, e.g., [Newell&Simon 72]. $

The general answer, then, is ⊗4No⊗*, the initial settings of the Worth
values are not crucial. Guessing reasonable initial values for them is
merely a time-saving device.
This suggests
an interesting research problem: what impact does
the quality of initial starting values have on humans? Give several bright
undergraduate math majors the same set of objects and operators to play with,
but (i) tell some of them nothing, (ii) tell others that a certain few pieces of
the system are very promising, and (iii) for still others emphasize a different
subset of the objects and operators.
How does "misinformation" impede the humans? How about no information?
Have them give verbal protocols about where they are focussing their attention,
and why.

Albeit at a  nontrivial cost, the  Worth facets did manage  to correct
themselves  by the  end of  a long$$  A couple  cpu hours,  about a
thousand tasks  total selected from  the agenda  $ run.   What  would
happen  if the  Worth  facets of  those 115  concepts  were not  only
initialized  to 200, but were  held fixed at 200  for the duration of
the run?

In this  case, the  delay still subsided with time.  That is,  AM
still got more and more "back to normal" as it progressed onward. The reason
is that AM's later work dealt with concepts like Primes,
Square-root, etc., which were so far removed  from the initial base of
concepts that the initial concepts' Worths were of little consequence.

Even  more drastically, we  could force all  the Worth  facets of all
concepts -- even newly-created  ones -- to be  kept at the value  200
forever. In this case,  AM's behavior doesn't completely disintegrate,
but  that delay factor  actually increases with  time: apparently, AM
begins to suffer from the exponential growth of "things to do" as its
repertoire  of  concepts  grows  linearly.   Its  purposiveness,  its
directionality depends on "focus of attention" more and more, and  if
that feature is removed,  AM loses much of its rationality.   A factor
of 5  delay doesn't sound that bad  "efficiency-wise", but the actual
apparent  behavior of  AM  is  as  staccato  bursts  of  development,
followed  by wild  leaps  to  unrelated concepts.  AM  no longer  can
"permanently" record its interest in a certain concept.

So  we conclude that the  Worth facets are (i)  not finely tuned, yet
(ii) provide important  global information about the  relative values
of  concepts.   If  the  Worth facets  are  completely disabled,  the
rationality of AM's behavior hangs on the slender thread of "focus of
attention".

. SSSEC(How finely tuned is the Agenda?)

.RTEXSSSEC: SSSECNUM;

.COMMENT This assumes that expt number = sssec number;

.RTEXNO: SSSECNUM;

The top few candidates  on the agenda always appear  to be reasonable
(to me).   If I work with  the system, guiding it, I  can cause it to
make a few discoveries it wouldn't otherwise make, and I can cause it
to make its typical ones much faster  (about a factor of 2). Thus the
⊗4very⊗* top task is not always the best.

If  AM randomly selects one of  the top 20 or  so tasks on the agenda
each time, what  will happen to  its behavior? Will it  disintegrate,
slow down by a factor of 10, slow down slightly,...?

This experiment required only a few seconds to set up, but demanded a
familiarity with  the  LISP  functions which  make  up  AM's  control
structure.   At  a  certain  point,  AM asks  for  Best-task(Agenda).
Typically,  the LISP function  Best-task is  defined as CAR  -- i.e.,
pick the first  member from the  list of tasks.   What  I did was  to
redefine Best-task as  a function which randomly selected  n from the
set  {1,2,...,20},  and  then  returned  the  n⊗Ath⊗*  member of  the
job-list.
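
In outline, the change amounts to no more than this (a Python rendering of
the Interlisp modification; "best_task_random20" is my name for it):

.B0

import random

def best_task(agenda):                # the normal regime: the top task
    return agenda[0]                  # i.e., CAR of the job-list

def best_task_random20(agenda):       # the experiment: assumes a
    n = random.randint(1, min(20, len(agenda)))    # nonempty agenda
    return agenda[n - 1]              # the n-th member of the job-list

.E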

If you watch the top job on the agenda, it will  take about 10 cycles
until  AM chooses  it.   And yet  there are  many  good, interesting,
worthwhile jobs sprinkled  among the top  20 on the  agenda, so  AM's
performance is cut  by merely a factor of  3, as far as cpu  time per
given major  discovery.  Part of this  better-than-20 behavior is due
to the fact  that the 18⊗Ath⊗*  best task had  a much lower  priority
rating than the  top few, hence was allocated much  less cpu time for
its  quantum  than the  top  task would  have received.    Whether it
succeeded or  failed, it  used up  very little  time.   Since AM  was
frequently  working on a  low-value task,  it was unwilling  to spend
much time or space on it. So the mean time allotted per task fell  to
about 15 seconds (from the typical 30  secs). Thus, the "losers" were
dealt  with quickly,  so the  detriment  to cpu-time  performance was
softened.

Yet AM is much less rational in its sequencing of tasks. A topic will
be dropped  right in the middle,  for a dozen cycles,  then picked up
again.   Often a  "good" task will  be chosen, having  reasons all of
which were true  10 cycles ago --  and which are clearly  superior to
those  of the last  10 tasks. This  is what  is so annoying  to human
onlookers.

To carry this  investigation further, another  experiment was  carried
out. AM was forced to alternate between choosing  the top task on the
agenda,  and a randomly-chosen one.   Although its  rate of discovery
was cut by less than half, its behavior was almost as  distasteful to
the user as in the last (always-random) experiment.

.ALTEXP: PAGE;

⊗2↓_Conclusion_↓⊗*: Picking (on the  average) the 10th-best candidate
impedes progress  by a factor less than 10 (about a factor of 3), but
it dramatically  degrades the  "sensibleness" of  AM's behavior,  the
continuity  of its actions.   Humans  place a  big value  on absolute
sensibleness, and believe that doing something silly 50% of the  time
is ⊗4↓_much_↓⊗* worse than being merely half as productive while always
doing the next most logical task.

Corollary: Having  20 multi-processors simultaneously execute the top
20 jobs will  increase the rate  of "big" discoveries,  but not by  a
full factor of 20.

Another experiment in this same vein was done, one which was designed
to  be far more crippling to AM.  Be-threshhold was held at 0 always,
so ⊗4any⊗*  task which  ever got  proposed  was kept  forever on  the
agenda, no matter  how low its priority.   The Best-task function was
modified so it randomly selected any member of the list of jobs. As a
final insult, the Worth  facets of all the concepts  were initialized
to 200 before starting AM.

Result:  Many "explosive" tasks  were chosen,  and the number  of new
concepts increased rapidly.   As  expected, most of  these were  real
"losers".  There  seemed no rationality to AM's  sequence of actions,
and  it was  quite  boring to  watch it  floundering so.  The typical
length of the agenda was about 500, and AM's performance was "slowed"
by at least  a couple orders of magnitude.  A more subjective measure
of its "intelligence" would say that it totally collapsed under  this
random scheme.

↓_⊗2Conclusion:⊗*_↓  Having   an  unlimited   number  of   processors
simultaneously execute all  the jobs on the agenda would increase the
rate at which AM made  big discoveries, at an ever accelerating  pace
(since the length of the agenda would grow exponentially).

Having a uniprocessor ⊗4simulate⊗*  such parallel processing would be
a losing  idea, however. The truly "intelligent" behavior AM exhibits
is its plausible sequencing of tasks.

. SSSEC(How valuable is tacking reasons onto each taskα?)

Let's dig inside  the agenda  scheme now.   One idea I've repeatedly emphasized
is the attaching of reasons to  the tasks on the agenda,
and using  those reasons  and their  ratings to  compute the  overall
priority value  assigned to each  task.  An  experiment was  done to
ascertain  the amount  of intelligence that  was emanating  from that
idea.

The global  formula  assigning  a  priority value  to  each  job  was
modified. We let it  still be a function of the  reasons for the job,
but we "trivialized" it: the priority of a job was computed as simply
the number of reasons it  has (normalized by multiplying by  100, and
cut-off if over 1000).

This raised the  new question of what to do  if several jobs all have
the same  priority.   In that case, I had AM execute  them  in
stack-order (most  recent first)$$ Why?  Because (i) it  sounds right
intuitively to me, (ii) this is akin to human focus of attention, and
mainly because  (iii)  this is  what AM  did  anyway, with  no  extra
modification. $.
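
The trivialized formula and its tie-breaker can be summarized in a few lines
(a Python sketch; the representation of a task as a record bearing a list of
reasons is my own simplification):

.B0

def trivial_priority(task):           # 100 x (number of reasons), capped
    return min(1000, 100 * len(task['reasons']))

def next_task(agenda):                # agenda ordered oldest-first
    best = max(trivial_priority(t) for t in agenda)
    for task in reversed(agenda):     # most recent first: LIFO tie-breaking
        if trivial_priority(task) == best:
            return task

.E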

Result:  I  secretly  expected  that  this  wouldn't  make  too  much
difference in AM's apparent level of directionality, but such was
definitely not the case.   While AM opened by doing tasks  which were
far more interesting and  daring than usual (e.g., filling in various
Coalescings right away),  it soon  became obvious that  AM was  being
swayed by hitherto trivial coding decisions.   Whole classes of tasks
--  like Checking Examples  of C --  were never  chosen, because they
only had one or two reasons supporting them.  Previously, one  or two
good reasons  were sufficient. Now,  tasks with several  poor reasons
were  rising to the  top and being  worked on. Even  the LIFO (stack)
policy for resolving ties didn't keep AM's attention focussed.

⊗2↓_Conclusion:_↓⊗* Unless a conscious effort is made  to ensure that
each  reason really will  carry roughly  an equal amount  of semantic
impact (charge, weight), it is not acceptable merely to choose  tasks
on  the  basis of  how  many  reasons  they possess.  Even  in  those
constricted  equal-weight  cases,  the  similarities between  reasons
supporting a task should be taken into account.

Another experiment, not yet performed, will pin down the value of this
rule-attaching idea even more precisely. A threshold value -- say 400 --
will be fixed. Any reason whose rating is above that threshold will be
called a ⊗4good⊗* reason, and every other reason will be a minor reason.
Then tasks will be ordered by the number of good reasons they possess,
and ties will be broken by the number of minor reasons.
Still another experiment would be to randomly pick any task with at least
one good reason.
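
For definiteness, the proposed ordering might look like this (a Python
sketch; reason ratings are assumed to be the usual 0-1000 numbers):

.B0

GOOD = 400                            # the suggested threshold

def sort_key(task):                   # task['reasons']: list of ratings
    good = sum(1 for r in task['reasons'] if r > GOOD)
    minor = len(task['reasons']) - good
    return (good, minor)              # ties in good broken by minor

def order_agenda(agenda):             # best task first
    return sorted(agenda, key=sort_key, reverse=True)

.E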

. SSSEC(What if certain concepts are eliminated/added?)

Feeling  in  a  perverse mood  one  day,  I  eliminated  the  concept
"Equality" from AM, to see what it would then do.  Equality was a key
concept,  because  AM  discovered   Numbers  via  the  technique   of
generalizing  the  relation "Equality"  (exact  equality  of 2  given
structures,  at  all  internal  levels).   What  would  happen  if we
eliminate this  path?  Will  AM rederive  Equality?   Will it get  to
Cardinality via another route? Will it do some set-theoretic things?

Result:  Rather disappointing. AM  never did re-derive  Equality, nor
Cardinality.  It spent its time thrashing about with various  flavors
of data-structures (unordered vs.  ordered, multiple-elements allowed
or  not, etc.),  deriving large  quantities  of boring  results about
them. Very many composings and coalescings were done, but no exciting
new operations were produced.

It is  expected that  eliminating other,  less central  concepts than
Equality will  do less damage to AM's progress. The reader is invited
to try such experiments himself.

To  eliminate  a  concept,  like  equality,  one   need  merely  type
⊗4KILB(OBJ-EQUALITY$$ To find out  the precise PNAME of each concept,
just type ⊗4CONCEPTS⊗*.  $)⊗1 at the beginning of the session, before
typing ⊗4(START)⊗*.

An even  kinder type of  experiment would be  to ⊗4add⊗* a  few concepts.
One  such  experiment was  done: the  addition  of Cartesian-product.
This operation,  named  C-PROD, accepts  two  sets as  arguments  and
returns a third set as its  value: the Cartesian product of the first
two.

Result:  The only significant change in  AM's behavior was that TIMES
was discovered first as the restriction of  C-PROD to Canonical-Bags.
When it  soon was rediscovered in  a few other guises,  its Worth was
even higher than usual.  AM  spent even more time exploring  concepts
concerned with it, and deviated much less for quite a long time.

Synthesis of the above experiments: It appears  that AM may really be
more specialized than expected; AM may only be able to forge  ahead along one or two
main lines  of development  -- at  least if  we demand  it make  very
interesting,  well-known  discoveries  quite  frequently.    Removing
certain  key concepts  can be disastrous.  On the  other hand, adding
some carefully-chosen  new  ones  can  greatly  enhance   AM's
directionality (hence its apparent intelligence).

Conclusion:  In its current state,
AM is thus  seen to  be ⊗4minimally competent⊗*:  if any
knowledge is  removed,  it appears much less intelligent;  if any  is  added,  it
appears slightly smarter.

Suggestion for future research:
A hypothesis, which should be tested experimentally, is that the importance
of the presence of each individual 
concept  decreases as the number of -- and ⊗4depth⊗* of -- the 
synthesized concepts increase.
That is, any excision would eventually "heal over", given enough time.
The failure of  AM to verify this may be due to the relatively small amount of
development in toto (an hour of cpu time, a couple hundred new concepts, a few
levels deeper than the starting ones).

. SSSEC(What if certain heuristics are tampered with?)

The class  of  experiments 
described by this section's heading
should prove entertaining, but it will
probably be difficult to learn much from their results.

Why is this?  Some of the heuristics were added to correct a specific
problem; removing them would simply re-initiate that problem.  Others
were  never actually  used by  AM, so  their deletion  would have  no
effect.   If AM enlarged the range of what it worked on, their absence
might then be felt.

What good would these experiments be, then? We  might learn something
about the "redundancy of  reasoning chains". We'd stop AM just before
it made a big  discovery, remove the heuristic  rule it was about  to
use, and see if it ever makes that big discovery anyway, later on. If
not,  perhaps the  discarded rule  was very  important, or  there are
alternate rules which exist but  haven't been inserted in AM. If  the
same discovery is  made by an alternate route, does  that indicate an
unexpected  duplication of  heuristic knowledge?  If heuristic  H2 is
used now, instead of H1,  does that suggest a new meta-rule:  "if you
want to apply one of H1/H2 but can't, see if the other rule ⊗4can⊗* be
applied."?  Is that last sentence really a Meta-meta-rule?

Before this discussion  enters an infinite  loop, I'd better  extract
myself -- and the reader  -- by commenting that there may  be an idea
in  all this,  perhaps  of use  to whoever  writes  Meta-AM.   It was
decided not to carry out  a systematic series of experiments of  this
type until AM is much further developed in abilities.


. SKIP TO COLUMN 1; SSSEC(Can AM work in a new domainα: Plane Geometry?)

.QQ

A true strategy should be domain-independent.

-- Adams

.ESS

As McDermott  points  out [McDermott 76],  just labelling  a
bunch of  heuristics `Operation heuristics' doesn't suddenly make them
relevant to any operation; all it  does is give that impression to  a
human who  looks at  the code (or  a description of  it).   Since the
author  hoped that the  labelling really was fair,  an experiment was
done to test this.   Such an experiment would be a key  determiner of
how general AM is.

How  might one  demonstrate  that the  "Operation"  heuristics really
could be useful for dealing with any operation, not just the ones
already in AM's initial base of concepts?

One  way would  be  to  pick a  new  domain,  and  see how  many  old
heuristics  contribute to -- and  how many new heuristics  have to be
added to elicit  -- some  sophisticated behavior in  that domain.  Of
course,  some new  primitive  concepts would  have  to be  introduced
(defined) to AM.

.GEOEX: PAGE;

Only  one experiment of this type was  attempted.  The author added a
new base  of concepts  to the  ones already  in AM.   Included  were:
Point, Line, Angle, Triangle, Equality of points/lines/angles/triangles.
These simple plane  geometry notions were  sufficiently removed  from
set-theoretic ones that those pre-existing specific concepts would be
totally  irrelevant; on the other  hand, the general  concepts -- the
ones with the heuristics attached -- would still be just as relevant:
Any-concept, Operation, Predicate, Structure, etc.

For each  new geometric  concept, the  only facet  filled in  was its
Definition.  For the new predicates and operators, their Domain/range
entries were also supplied.
No new heuristics were added to AM.
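
To indicate the scale of the additions, here is a sketch of what "inserting
a new base of concepts" amounted to (modern Python; the record layout and
the sample definitions are mine, not AM's actual facet format):

.B0

concepts = {}                         # concept name -> its facets

def defconcept(name, defn, domain_range=None):
    concepts[name] = {'DEFN': defn}   # only the Definition facet...
    if domain_range:                  # ...plus Domain/range for operators
        concepts[name]['DOMAIN/RANGE'] = domain_range

defconcept('POINT', lambda p: isinstance(p, tuple) and len(p) == 2)
defconcept('ANGLE-EQUAL',
           lambda a1, a2: abs(a1 - a2) < 1e-9,
           domain_range=[('ANGLE', 'ANGLE', '->', 'TRUE-FALSE')])

.E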

Results: fairly good behavior.   AM was able to find examples  of all
the  concepts defined, and  to use  the character  of the  results of
those example searches to determine intelligent courses of action.
AM derived congruence and similarity of  triangles, and several other
well-known  simple  concepts.  An  unusual  result  was the  repeated
derivation of the idea  of "timberline". This  is a predicate on  two
triangles: Timberline(T1,T2) iff  T1 and T2 have a  common angle, and
the sides opposite that angle in the two triangles are parallel:

.B0

		    ⊗1 ⊗*    A
			/\  
		       /  \
		      /    \
		     /      \		⊗3Timberline(ABC,ADE)⊗*
		    /	     \
		B  /∂∂∂∂∂∂∂∂∂∂\  C
		  /	       \
		 /		\
		/		 \
               D∂∂∂∂∂∂∂∂∂∂∂∂∂∂∂∂∂∂E

.E

Since AM kept rederiving  this in new ways, it  seems surprising that
there is no very  common name for the concept. It could be that AM is
using techniques which humans don't -- at least, for geometry.

The only new bit of knowledge that came out of this experiment was  a
"use" for  Goldbach's conjecture:  any angle  (0-180 degrees) can  be
built  up (to  within 1  degree) as the  sum of  two angles  of prime
degrees (<180). This result is admittedly esoteric at best,  but is
nonetheless worth reporting.

The total effort expended on this  experiment was: a few months of
subconscious processing,  ten hours of designing the base of concepts
to insert,  ten hours inserting  and debugging  them. The whole  task
took about two days of real time.

The conclusion to be drawn is that heuristics really can be generally
useful; their  attachment  to  general-sounding concepts  is  not  an
illusion.$$
Or: it's a very good illusion! But note: if this phenomenon is repeatable
and useful, then (like Newtonian mechanics) it won't pragmatically 
matter whether it's only
an illusion.
$   The  implication  of  this  is  that  AM  can  be  grown
incrementally,  domain by domain.   Adding expertise in  a new domain
requires only the introduction of concepts local to that  domain; all
the very  general concepts --  and their heuristics --  already exist
and can be used with no change.


The  author feels  that  this result  can be  generalized: AM  can be
expanded in scope,  even to non-mathematical  fields of endeavor.  In
each  field, however, the  rankings of  the various  heuristics$$ the
numeric values  that 
should be returned by
the  local ratings  formulae  which are attached  to  the
heuristic rules. $  may shift  slightly. As the  domain
gets further  away from mathematics, various heuristics are important
which were ignorable  before (e.g., those  dealing with ethics),  and
some pure  math research-oriented  heuristics become  less applicable
("giving  up and  moving on  to another  topic" is not  an acceptable
response to the 15-puzzle, nor to a hostage situation).

Well, it sounds as if we've shifted our orientation from `Results' to
a subjective  evaluation of those results. Let's  start a new chapter
to legitimize this type of commentary.